Abstract

This article contains suggestions on how to perform the Fast Fourier
Transform over large finite fields. The technique is to use the fact
that the multiplicative groups of specific prime fields are surprisingly
composite.

1 Introduction 

In 2003 Gao published the article A New Algorithm for Decoding Reed-Solomon
Codes [1]. Gao's algorithm can be executed throughout with the use of the
Discrete Fourier Transform (DFT) in a finite field. The algorithm will of
course be much faster using the Fast Fourier Transform (FFT). The coding
and decoding of Reed-Solomon codes is often performed over finite fields
$\mathbb{F}_q$ of order $q = 2^m$, where $m \in \mathbb{N}$. In 2006 and
2007 Truong, Chen, Wang, Chang & Reed published the article [2], [3]: Fast
prime factor, discrete Fourier algorithms over $GF(2^m)$, for
$8 \le m \le 10$, which is a sort of follow-up on another article [1] that
treats the cases $m = 4, 5, 6, 8$. These results are very important, but
$2^{10} = 1024$, and for instance digital TV signals use much bigger files,
say of the order of $2^{20} \approx 10^6$. For large files like this my
suggestion is to augment the file with an extra bit, and let the FFT be
performed over a prime field $\mathbb{F}_p$. For certain primes this can be
performed efficiently by an algorithm based on the Cooley-Tukey algorithm
[5] from 1965. In the next section the algorithm is explained, and the
later sections contain a variety of suggestions for well-suited primes $p$,
for which the FFT over $\mathbb{F}_p$ will be especially efficient.

MSC 11, 42, 68 and 94. Keywords: Finite Fields, Discrete Fourier Transform (DFT).

2 Fast Fourier Transformation over Finite Fields 

Definition 1. Let $\omega$ be an element of $\mathbb{F}_{p^m}$ of order $n$,
where $n \mid p^m - 1$. The Discrete Fourier Transform (DFT) of the n-tuple
$v = (v_0, v_1, \ldots, v_{n-1}) \in \mathbb{F}_{p^m}^n$ is the n-tuple $V$
with components given by

$$V_j = \sum_{i=0}^{n-1} \omega^{ij} v_i, \qquad j = 0, 1, \ldots, n-1 \tag{1}$$

The Inverse Discrete Fourier Transform (IDFT) of the n-tuple
$V \in \mathbb{F}_{p^m}^n$ is the n-tuple $v$ with components given by

$$v_i = (n^{-1} \bmod p) \sum_{j=0}^{n-1} \omega^{-ij} V_j, \qquad i = 0, 1, \ldots, n-1$$

For a proof see e.g. [6]. Notice that the IDFT, apart from the factor
$(n^{-1} \bmod p)$, is also a DFT.
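As an illustration of Definition 1, here is a minimal Python sketch of the direct DFT and IDFT, restricted to a prime field $\mathbb{F}_p$ (the case $m = 1$). The function names `dft` and `idft` and the small example ($p = 17$, $\omega = 2$ of order $n = 8$) are assumptions chosen for illustration only.

```python
def dft(v, omega, p):
    """Direct DFT over F_p: V_j = sum_i omega^(i*j) * v_i (mod p)."""
    n = len(v)
    return [sum(pow(omega, i * j, p) * v[i] for i in range(n)) % p
            for j in range(n)]


def idft(V, omega, p):
    """Inverse DFT: v_i = n^(-1) * sum_j omega^(-i*j) * V_j (mod p)."""
    n = len(V)
    n_inv = pow(n, -1, p)            # modular inverse, Python 3.8+
    omega_inv = pow(omega, -1, p)
    return [n_inv * sum(pow(omega_inv, i * j, p) * V[j] for j in range(n)) % p
            for i in range(n)]


# Illustrative parameters: 2 has order 8 in F_17, because 2^4 = 16 = -1 (mod 17).
p, omega = 17, 2
v = [1, 2, 3, 4, 5, 6, 7, 8]
V = dft(v, omega, p)
assert idft(V, omega, p) == v        # the round trip recovers the input
```

The direct evaluation uses $n^2$ multiplications, which is the baseline the rest of the section improves on.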

Now assume that $n \mid p^m - 1$ is composite: $n = r_1 r_2$. The indices in
definition 1 can be rewritten like this:

$$j = j_1 r_1 + j_0, \qquad j_0 = 0, 1, \ldots, r_1 - 1, \quad j_1 = 0, 1, \ldots, r_2 - 1$$

$$i = i_1 r_2 + i_0, \qquad i_0 = 0, 1, \ldots, r_2 - 1, \quad i_1 = 0, 1, \ldots, r_1 - 1$$

Replacing $v_i$ by $x_0(i_1, i_0)$, equation (1) can now be rewritten as:

$$V(j_1, j_0) = \sum_{i_0=0}^{r_2-1} \Big( \sum_{i_1=0}^{r_1-1} x_0(i_1, i_0)\, \omega^{i_1 r_2 j} \Big) \omega^{i_0 j}$$

Since $\omega^n = \omega^{r_1 r_2} = 1$, we have $\omega^{i_1 r_2 j} = \omega^{i_1 r_2 j_0}$.
Set

$$x_1(j_0, i_0) = \sum_{i_1=0}^{r_1-1} x_0(i_1, i_0)\, \omega^{j_0 i_1 r_2}$$

Then

$$V(j_1, j_0) = \sum_{i_0=0}^{r_2-1} x_1(j_0, i_0)\, \omega^{(j_1 r_1 + j_0) i_0}$$

It will require $n r_1$ multiplications and $n(r_1 - 1)$ additions to calculate $x_1$
for all $(j_0, i_0)$, and $n r_2$ multiplications and $n(r_2 - 1)$ additions to calculate $V$
from $x_1$. This gives a total of $n(r_1 + r_2)$ multiplications and $n(r_1 + r_2 - 2)$
additions in $\mathbb{F}_{p^m}$. For $n > 4$ this is faster than the DFT, which requires $n^2$
multiplications and $n(n - 1)$ additions in $\mathbb{F}_{p^m}$.
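The two-step computation can be sketched in Python as follows, restricted to a prime field $\mathbb{F}_p$. The names `fft_two_factor` and `dft_naive` and the example ($p = 13$, $\omega = 2$, $n = 12 = 3 \cdot 4$) are illustrative assumptions; the index maps are those of the text, $j = j_1 r_1 + j_0$ and $i = i_1 r_2 + i_0$.

```python
def dft_naive(v, omega, p):
    """Reference DFT straight from equation (1)."""
    n = len(v)
    return [sum(pow(omega, i * j, p) * v[i] for i in range(n)) % p
            for j in range(n)]


def fft_two_factor(v, omega, p, r1, r2):
    """Two-step transform over F_p for n = r1 * r2."""
    n = r1 * r2
    assert len(v) == n
    # Step 1: x1(j0, i0) = sum_{i1} x0(i1, i0) * omega^(j0 * i1 * r2)
    x1 = [[sum(v[i1 * r2 + i0] * pow(omega, j0 * i1 * r2, p)
               for i1 in range(r1)) % p
           for i0 in range(r2)]
          for j0 in range(r1)]
    # Step 2: V(j1, j0) = sum_{i0} x1(j0, i0) * omega^((j1*r1 + j0) * i0)
    V = [0] * n
    for j1 in range(r2):
        for j0 in range(r1):
            j = j1 * r1 + j0
            V[j] = sum(x1[j0][i0] * pow(omega, j * i0, p)
                       for i0 in range(r2)) % p
    return V


# Illustrative parameters: 2 has order 12 in F_13.
p, omega = 13, 2
v = list(range(1, 13))
assert fft_two_factor(v, omega, p, 3, 4) == dft_naive(v, omega, p)
```

Step 1 performs $r_1$ multiplications for each of the $n$ entries of $x_1$, and step 2 performs $r_2$ for each of the $n$ outputs, matching the count $n(r_1 + r_2)$.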

More generally, if $n = r_1 r_2 \cdots r_s$ where $r_1, r_2, \ldots, r_s \in \mathbb{N}$, then the indices
$j$ and $i$ can be expressed like this:

$$j = j_{s-1} r_1 r_2 \cdots r_{s-1} + j_{s-2} r_1 r_2 \cdots r_{s-2} + \ldots + j_1 r_1 + j_0$$

where

$$j_{k-1} = 0, 1, \ldots, r_k - 1, \qquad 1 \le k \le s$$

and

$$i = i_{s-1} r_2 r_3 \cdots r_s + i_{s-2} r_3 r_4 \cdots r_s + \ldots + i_1 r_s + i_0$$

where

$$i_{k-1} = 0, 1, \ldots, r_{s-(k-1)} - 1, \qquad 1 \le k \le s$$

Now equation (1) can be rewritten [7], by setting $v_i = x_0(i_{s-1}, i_{s-2}, \ldots, i_1, i_0)$
and $V_j = V(j_{s-1}, j_{s-2}, \ldots, j_1, j_0)$, as

$$V(j_{s-1}, j_{s-2}, \ldots, j_1, j_0) = \sum_{i_0=0}^{r_s-1} \sum_{i_1=0}^{r_{s-1}-1} \cdots \sum_{i_{s-1}=0}^{r_1-1} x_0(i_{s-1}, i_{s-2}, \ldots, i_1, i_0)\, \omega^{ij}$$

Using the fact that $\omega^{r_1 r_2 \cdots r_s} = \omega^n = 1$, this expression can be calculated
by $s$ recursive equations:

$$x_1(j_0, i_{s-2}, i_{s-3}, \ldots, i_1, i_0) = \sum_{i_{s-1}=0}^{r_1-1} x_0(i_{s-1}, i_{s-2}, \ldots, i_1, i_0)\, \omega^{j_0 i_{s-1} r_2 \cdots r_s} \tag{2}$$

$$x_k(j_0, j_1, \ldots, j_{k-1}, i_{s-k-1}, \ldots, i_1, i_0) = \sum_{i_{s-k}=0}^{r_k-1} x_{k-1}(j_0, j_1, \ldots, j_{k-2}, i_{s-k}, \ldots, i_1, i_0)\, \omega^{(j_{k-1} r_1 r_2 \cdots r_{k-1} + \ldots + j_1 r_1 + j_0)\, i_{s-k}\, r_{k+1} r_{k+2} \cdots r_s}$$

for $k = 2, 3, \ldots, s - 1$

$$x_s(j_0, j_1, \ldots, j_{s-1}) = \sum_{i_0=0}^{r_s-1} x_{s-1}(j_0, j_1, \ldots, j_{s-2}, i_0)\, \omega^{(j_{s-1} r_1 r_2 \cdots r_{s-1} + \ldots + j_1 r_1 + j_0)\, i_0}$$

Now the final output is $x_s(j_0, j_1, \ldots, j_{s-1}) = V(j_{s-1}, j_{s-2}, \ldots, j_1, j_0) = V_j$.
This algorithm requires a total of $n(r_1 + r_2 + \cdots + r_s)$ multiplications and
$n(r_1 + r_2 + \cdots + r_s - s)$ additions in $\mathbb{F}_{p^m}$. Here we did include the
multiplications by $\omega^0 = 1$.
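The operation counts stated above are easy to tabulate. The following small Python sketch uses the hypothetical helper names `fft_cost` and `dft_cost` and simply evaluates the formulas of the text for a given list of factors $r_1, \ldots, r_s$.

```python
from math import prod


def fft_cost(factors):
    """(multiplications, additions) for n = r_1 * ... * r_s as counted in the
    text, including the trivial multiplications by omega^0 = 1."""
    n = prod(factors)
    s = len(factors)
    return n * sum(factors), n * (sum(factors) - s)


def dft_cost(n):
    """(multiplications, additions) for the direct DFT of length n."""
    return n * n, n * (n - 1)


print(fft_cost([2, 2, 3]), dft_cost(12))   # (84, 48) (144, 132)
```

For $n = 12 = 2 \cdot 2 \cdot 3$ the factored algorithm thus needs 84 instead of 144 multiplications; the gap widens quickly as $n$ grows.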

For $n = r^\nu$, the algorithm requires a total of
$n \nu r = \frac{r}{\log_2(r)}\, n \log_2(n)$ multiplications. Over the integers
$r \ge 2$ the factor $\frac{r}{\log_2(r)}$ achieves its minimum for $r = 3$, but
$r = 2$ and $r = 4$ are still better because of the possibility of reducing the
number of multiplications using:
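The behaviour of the factor $r / \log_2(r)$ can be checked numerically. This is a small illustrative Python sketch; the function name `radix_factor` is an assumption.

```python
from math import log2


def radix_factor(r):
    """The factor r / log2(r) multiplying n*log2(n) in the count for n = r^nu."""
    return r / log2(r)


for r in (2, 3, 4, 5, 8):
    print(r, round(radix_factor(r), 3))

# Among integer radices the factor is smallest at r = 3; r = 2 and r = 4 tie.
assert min(range(2, 50), key=radix_factor) == 3
assert radix_factor(2) == radix_factor(4) == 2.0
```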

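Lemma 2. Let $\omega \in \mathbb{F}_{p^m}$ be of order $n$. If $n$ is even and $t \in \mathbb{Z}$, then

$$\omega^{\frac{n}{2} + t} = -\omega^{t}$$

Proof. The order of $\omega$ is $n \mid p^m - 1$. Hence $\omega^n = 1$. So
$0 = \omega^n - 1 = (\omega^{n/2} - 1)(\omega^{n/2} + 1)$. Since $\mathrm{ord}(\omega) = n$,
we have $\omega^{n/2} \ne 1$ and hence $\omega^{n/2} + 1 = 0$. □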
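Lemma 2 can be checked numerically in a small prime field. The sketch below assumes the illustrative parameters $p = 17$ and $\omega = 3$ (a generator, so $n = 16$); the helper name `check_lemma` is hypothetical.

```python
def check_lemma(omega, n, p, exponents):
    """Return True if omega^(n/2 + t) == -omega^t (mod p) for every t given."""
    return all(pow(omega, n // 2 + t, p) == (-pow(omega, t, p)) % p
               for t in exponents)


# 3 has order 16 in F_17: 3^8 = 16 = -1 (mod 17).
p, omega, n = 17, 3, 16
assert pow(omega, n, p) == 1 and pow(omega, n // 2, p) == p - 1
assert check_lemma(omega, n, p, range(0, 40))
```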

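For $n = 2^\nu$, use of the lemma will reduce the number of multiplications in
$\mathbb{F}_{p^m}$ by 50%. For $s \ge 3$ this is caused by the possibility of rearranging the
recursive equations (2) in a slightly different way (in principle due to [8]):

$$y_1(j_0, i_{s-2}, i_{s-3}, \ldots, i_1, i_0) = \Big( \sum_{i_{s-1}=0}^{r_1-1} x_0(i_{s-1}, i_{s-2}, \ldots, i_1, i_0)\, \omega^{j_0 i_{s-1} (n/r_1)} \Big)\, \omega^{j_0 i_{s-2} r_3 \cdots r_s} \tag{3}$$

$$y_k(j_0, j_1, \ldots, j_{k-1}, i_{s-k-1}, \ldots, i_1, i_0) = \Big( \sum_{i_{s-k}=0}^{r_k-1} y_{k-1}(j_0, j_1, \ldots, j_{k-2}, i_{s-k}, \ldots, i_1, i_0)\, \omega^{j_{k-1} i_{s-k} (n/r_k)} \Big)\, \omega^{(j_{k-1} r_1 r_2 \cdots r_{k-1} + \ldots + j_1 r_1 + j_0)\, i_{s-k-1}\, r_{k+2} \cdots r_s}$$

for $k = 2, 3, \ldots, s - 1$ when $s \ge 4$, else go to the next equation.

$$y_{s-1}(j_0, j_1, \ldots, j_{s-2}, i_0) = \Big( \sum_{i_1=0}^{r_{s-1}-1} y_{s-2}(j_0, j_1, \ldots, j_{s-3}, i_1, i_0)\, \omega^{j_{s-2} i_1 (n/r_{s-1})} \Big)\, \omega^{(j_{s-2} r_1 r_2 \cdots r_{s-2} + \ldots + j_1 r_1 + j_0)\, i_0}$$

$$y_s(j_0, j_1, \ldots, j_{s-1}) = \sum_{i_0=0}^{r_s-1} y_{s-1}(j_0, j_1, \ldots, j_{s-2}, i_0)\, \omega^{j_{s-1} i_0 (n/r_s)}$$

Here the final output is $y_s(j_0, j_1, \ldots, j_{s-1}) = V(j_{s-1}, j_{s-2}, \ldots, j_1, j_0) = V_j$.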
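The idea of separating a small $r_\ell$-point transform from a following multiplication by a power of $\omega$ can be sketched recursively in Python. This is not a literal transcription of (3): the sketch below peels off one factor per recursion level and applies the whole remaining twiddle factor $\omega^{i' j_0}$ at once, which distributes the twiddle multiplications differently over the steps. The names `fft_mixed_radix` and `dft_naive` and the example parameters are assumptions.

```python
def dft_naive(v, omega, p):
    """Reference DFT straight from equation (1)."""
    n = len(v)
    return [sum(pow(omega, i * j, p) * v[i] for i in range(n)) % p
            for j in range(n)]


def fft_mixed_radix(v, omega, p, factors):
    """Transform of length n = r_1 * ... * r_s over F_p, factors = [r_1, ..., r_s].
    omega must have order n modulo the prime p."""
    n = len(v)
    if not factors:                  # n == 1: nothing left to transform
        return list(v)
    r, rest = factors[0], factors[1:]
    m = n // r
    V = [0] * n
    for j0 in range(r):
        # r-point transform over the leading digit of i (kernel omega^(j0*k*n/r)),
        # followed by the twiddle factor omega^(i * j0).
        w = [sum(v[k * m + i] * pow(omega, j0 * k * m, p) for k in range(r))
             * pow(omega, i * j0, p) % p
             for i in range(m)]
        # The remaining m-point transform uses omega^r, which has order m.
        sub = fft_mixed_radix(w, pow(omega, r, p), p, rest)
        for j in range(m):
            V[j * r + j0] = sub[j]   # output index j = j' * r_1 + j0
    return V


# Illustrative parameters: 2 has order 12 in F_13.
p, omega = 13, 2
v = list(range(1, 13))
assert fft_mixed_radix(v, omega, p, [2, 2, 3]) == dft_naive(v, omega, p)
```

Any ordering of the factors gives the same result; only the sizes of the small transforms at each level change.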



For $\ell = 1, 2, \ldots, s$, an $r_\ell$-point Fourier Transform is included in step
number $\ell$ of the algorithm. Among these, each two-point Fourier Transform does
not require any multiplication because $\omega^0 = 1$ and $\omega^{n/2} = -\omega^0 = -1$.
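For $n = 2^\nu$ this can be made concrete with a radix-2 sketch in Python in which every two-point transform is just a sum and a difference. The function name `fft_radix2`, the counter argument and the example ($p = 17$, $\omega = 3$, $n = 16$) are assumptions; the counter tallies only multiplications of data by powers of $\omega$, assuming those powers are tabulated, so its bookkeeping is not identical to the counts in the text.

```python
def fft_radix2(v, omega, p, counter):
    """Radix-2 transform over F_p for len(v) = 2^nu, omega of order len(v).
    counter[0] accumulates the data-by-twiddle multiplications."""
    n = len(v)
    if n == 1:
        return list(v)
    m = n // 2
    # Two-point transforms: omega^0 = 1 and omega^(n/2) = -1, so no multiplication.
    sums = [(v[i] + v[i + m]) % p for i in range(m)]
    diffs = [(v[i] - v[i + m]) % p for i in range(m)]
    # Twiddle factors omega^i are needed on the difference branch only.
    twiddled = [diffs[i] * pow(omega, i, p) % p for i in range(m)]
    counter[0] += m
    omega_sq = omega * omega % p
    V = [0] * n
    V[0::2] = fft_radix2(sums, omega_sq, p, counter)       # even output indices
    V[1::2] = fft_radix2(twiddled, omega_sq, p, counter)   # odd output indices
    return V


# Illustrative parameters: 3 has order 16 in F_17.
p, omega = 17, 3
v = list(range(16))
count = [0]
V = fft_radix2(v, omega, p, count)
print(count[0])
```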

Within the original Cooley-Tukey algorithm, which is executed over the
field of complex numbers, it is possible to do additional tricks by looking at
the real and imaginary parts of a number. These tricks cannot be transferred
to a finite field.

The overall conclusion must be that the algorithm sketched above will be
relatively most efficient if the total number of points
$n = r_1^{\nu_1} r_2^{\nu_2} \cdots r_u^{\nu_u}$ is factored into factors as small as possible.
3 Concrete suggestions



Versions of the Cooley-Tukey algorithm are not very efficient over binary
fields $\mathbb{F}_{2^m}$ where $m \in \mathbb{N}$. In most of these cases the order $2^m - 1$
of the multiplicative group is not highly composite. For instance, the orders
of the fields examined in the recent articles [2], [3] are $2^8$, $2^9$ and $2^{10}$, where
$2^8 - 1 = 3 \times 5 \times 17$, $2^9 - 1 = 7 \times 73$ and $2^{10} - 1 = 3 \times 11 \times 31$. It
could also be mentioned that $2^7 - 1$ is a prime. The algorithm presented
in [2], [3] is a Prime Factor Algorithm, which as such takes advantage of
the fact that the prime factors of $2^m - 1$ are all distinct, and hence pairwise
coprime, for $8 \le m \le 10$. This will also be the case for a great deal of the
numbers $2^m - 1$ for bigger $m$, but some of the prime factors tend to be bigger
too. For example $2^{15} - 1 = 7 \times 31 \times 151$, $2^{16} - 1 = 3 \times 5 \times 17 \times 257$,
$2^{17} - 1$ is prime, $2^{18} - 1 = 3^3 \times 7 \times 19 \times 73$ and $2^{19} - 1$ is prime.
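The factorizations quoted above can be reproduced with a few lines of Python. This is a minimal trial-division sketch (the function name `factorize` is an assumption), adequate for numbers of this size.

```python
def factorize(n):
    """Trial-division factorization; returns {prime: exponent}."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:                        # whatever remains is a prime factor
        factors[n] = factors.get(n, 0) + 1
    return factors


for m in (7, 8, 9, 10, 15, 16, 17, 18, 19):
    print(m, factorize(2**m - 1))
```

A result with a single key, as for $m = 7, 17, 19$, means that $2^m - 1$ is prime and offers no factorization to exploit.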

It is obvious, as mentioned, that versions of the Cooley-Tukey algorithm
will not be very efficient in finite fields like these. In these cases it will be
much more efficient not to use all $m$ bits to their full extent, and to use
algorithm (2) or (3) over a prime field instead. Here are two examples:

Instead of using 17 bits to create the field $\mathbb{F}_{2^{17}}$, whose multiplicative group
has order 131071, which is a prime, use 18 bits to create the prime field
$\mathbb{F}_{147457}$, whose multiplicative group has order $2^{14} \times 3^2$.
Or, instead of using 19 bits to create the field $\mathbb{F}_{2^{19}}$, whose multiplicative group
has order 524287, which is also a prime, use an extra bit to create the
field $\mathbb{F}_{786433}$, whose multiplicative group has order $2^{18} \times 3$.
The orders of the multiplicative groups of the prime fields given in the two
examples are highly composite, and the algorithm (3), which is based on the
Cooley-Tukey algorithm, will be very efficient here: In the case $\mathbb{F}_{147457}$,
the DFT uses $(2^{14} \times 3^2)^2 \approx 2 \times 10^{10}$ multiplications and the FFT suggested
here will require $2^{14} \times 3^2 \times (14 \times 1 + 2 \times 3) \approx 3 \times 10^6$ multiplications, which is
$\frac{2^{14} \times 3^2}{14 \times 1 + 2 \times 3} \approx 7 \times 10^3$ times faster than the DFT. Here we have used lemma 2
to reduce the number of multiplications. The multiplication itself is also
easy: just multiplication modulo the prime, which in the example is 147457.
The estimate is roughly the same as regards the additions: with $n = 147456$, the DFT over
$\mathbb{F}_{147457}$ requires $147456 \times (147456 - 1) \approx 2 \times 10^{10}$ additions and our FFT (3)
requires $147456 \times (14 \times 2 + 2 \times 3 - (14 + 2)) \approx 3 \times 10^6$ additions, which is
$\frac{147456 - 1}{14 \times 2 + 2 \times 3 - (14 + 2)} \approx 8 \times 10^3$ times better.

In the second example, $\mathbb{F}_{786433}$, our FFT (3) will perform the multiplications
$\frac{2^{18} \times 3}{18 \times 1 + 3} \approx 4 \times 10^4$ times faster than the DFT. And the additions will similarly
be performed $\frac{2^{18} \times 3 - 1}{18 \times 2 + 3 - (18 + 1)} \approx 4 \times 10^4$ times faster than the DFT.

An FFT calculated in, for instance, 1 second would then take roughly 10
hours as a DFT.
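The speed-up estimates for the two examples can be recomputed with a short Python sketch. It assumes $n = p - 1 = 2^{\nu_1} 3^{\nu_2}$ points and the per-point costs used in the text (one multiplication per factor 2 thanks to lemma 2, three per factor 3); the function name `speedups` is hypothetical.

```python
def speedups(nu1, nu2):
    """Ratios DFT/FFT of multiplications and of additions for n = 2^nu1 * 3^nu2."""
    n = 2**nu1 * 3**nu2
    dft_mults, dft_adds = n * n, n * (n - 1)
    fft_mults = n * (nu1 * 1 + nu2 * 3)                 # lemma 2 applied to radix 2
    fft_adds = n * (nu1 * 2 + nu2 * 3 - (nu1 + nu2))
    return dft_mults / fft_mults, dft_adds / fft_adds


for name, (nu1, nu2) in {"F_147457": (14, 2), "F_786433": (18, 1)}.items():
    mult_ratio, add_ratio = speedups(nu1, nu2)
    print(name, round(mult_ratio), round(add_ratio))
```

At a speed-up of roughly $4 \times 10^4$, one second of FFT time corresponds to about eleven hours of direct DFT time, in line with the rough estimate above.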



4 The elements of order n 

In our FFT (3) over $\mathbb{F}_p$ an element $\omega$ of order $n \mid p - 1$ appears. Usually we
will choose $n = p - 1$, and then $\omega$ will be a generator of the multiplicative
group of $\mathbb{F}_p$. Such a generator will normally be easy to find: according to
Lagrange's theorem, in a finite group the order of any element is a divisor of
the order of the group. Therefore an element $a$ with $a^n \equiv 1 \pmod p$ (which
holds automatically when $n = p - 1$) is a generator of the multiplicative
subgroup of $\mathbb{F}_p$ with $n$ elements iff

$$a^{n/r} \not\equiv 1 \pmod p \quad \text{for every prime factor } r \text{ of } n.$$

A probabilistic algorithm to determine the smallest possible generator of
the multiplicative subgroup of $\mathbb{F}_p$ with $n$ elements goes like this:

Algorithm 3.

Input: $n \mid p - 1$

1. Prime factorize $n$.

2. Choose the smallest integer $a$ from the set $\{2, 3, \ldots, n\}$.

3. For every prime factor $r$ of $n$ calculate $a^{n/r} \bmod p$.

4. If this quantity is different from 1 for all prime factors $r$ of $n$, then stop.
Else repeat steps 2 and 3 for the next lowest values of $a \in \{2, 3, \ldots, n\}$ until
this happens. Then stop.

Output: The last value of $a$.
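Algorithm 3 can be sketched in Python as follows. The names `prime_factors` and `smallest_generator` are assumptions, and the sketch adds one check that is not among the steps above: it first tests $a^n \equiv 1 \pmod p$, which matters only when $n$ is a proper divisor of $p - 1$.

```python
def prime_factors(n):
    """Distinct prime factors of n by trial division (step 1)."""
    primes, d = [], 2
    while d * d <= n:
        if n % d == 0:
            primes.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        primes.append(n)
    return primes


def smallest_generator(n, p):
    """Smallest a >= 2 of order n modulo the prime p, for n >= 2 dividing p - 1."""
    assert n >= 2 and (p - 1) % n == 0
    primes = prime_factors(n)
    for a in range(2, p):                                    # step 2
        if pow(a, n, p) != 1:
            continue            # added check: a is not in the subgroup of order n
        if all(pow(a, n // r, p) != 1 for r in primes):      # steps 3 and 4
            return a
    raise ValueError("no element of order n found; is p prime?")


print(smallest_generator(2**16, 65537))
print(smallest_generator(2**18 * 3, 786433))
```

Each candidate costs only one modular exponentiation per prime factor of $n$, so the search is cheap even for the largest primes considered here.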

Comments on the algorithm: For practical purposes $n < 2^{30}$, and
then the prime factorization of $n$ will not be computationally difficult. The
algorithm will always find a generator $\omega$, as we know that it exists. If the
prime factorization of $n$ is $n = p_1^{\nu_1} p_2^{\nu_2} \cdots p_u^{\nu_u}$, then the number of generators of
the multiplicative subgroup of $\mathbb{F}_p$ will be
$\varphi(n) = n (1 - \frac{1}{p_1})(1 - \frac{1}{p_2}) \cdots (1 - \frac{1}{p_u})$
(see e.g. [H]). Hence the probability of a random $a \in \{2, 3, \ldots, n\}$ being a
generator $\omega$ is approximately $(1 - \frac{1}{p_1})(1 - \frac{1}{p_2}) \cdots (1 - \frac{1}{p_u})$. In both the earlier mentioned
examples $\mathbb{F}_{147457}$ and $\mathbb{F}_{786433}$ this probability is $(1 - \frac{1}{2})(1 - \frac{1}{3}) = \frac{1}{3}$.
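The fraction of generators can be computed exactly and compared with a brute-force count in a small field. In this Python sketch the names `generator_fraction` and `count_generators` and the test prime $p = 13$ are illustrative assumptions.

```python
from fractions import Fraction


def generator_fraction(primes):
    """phi(n)/n for an n whose distinct prime factors are given."""
    frac = Fraction(1)
    for q in primes:
        frac *= 1 - Fraction(1, q)
    return frac


def count_generators(p):
    """Brute-force count of generators of the multiplicative group of F_p."""
    n = p - 1
    count = 0
    for a in range(1, p):
        x, order = a, 1
        while x != 1:                # multiply until the power returns to 1
            x = x * a % p
            order += 1
        count += (order == n)
    return count


print(generator_fraction([2, 3]))    # 1/3
assert count_generators(13) == 12 * generator_fraction([2, 3])
```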

5 A list of suitable choices of primes p 

At the end of this article I will print a list of primes $2^{16} < p < 2^{21}$ where the
only prime factors of $p - 1$ are 2 and 3. For these primes the FFT algorithms
(2) and (3) treated here will be especially efficient. For $n = p - 1 = 2^{\nu_1} 3^{\nu_2}$ the
number of multiplications in algorithm (2) will be $(p - 1)(2\nu_1 + 3\nu_2)$, which can
be reduced to $(p - 1)(\nu_1 + 3\nu_2)$ using algorithm (3). The number of additions
will for both FFT algorithms be $(p - 1)(2\nu_1 + 3\nu_2 - (\nu_1 + \nu_2)) = (p - 1)(\nu_1 + 2\nu_2)$.
As we see, the table starts with the biggest Fermat number $2^{2^n} + 1$
known to be a prime. For nearly half of the shown numbers $p - 1$ a generator
$\omega$ of $\mathbb{F}_p$ is $5 = 2^2 + 1$, a nice number to multiply with in base 2.
The factoring was implemented with the math program Maple on my mobile PC.






Prime p     Factorization of p - 1      Generator ω

65537       $2^{16}$                    3
139969      $2^{6} \times 3^{7}$        13
147457      $2^{14} \times 3^{2}$       10
209953      $2^{5} \times 3^{8}$        10
331777      $2^{12} \times 3^{4}$       5
472393      $2^{3} \times 3^{10}$       5
629857      $2^{5} \times 3^{9}$        5
746497      $2^{10} \times 3^{6}$       5
786433      $2^{18} \times 3$           10
839809      $2^{7} \times 3^{8}$        7
995329      $2^{12} \times 3^{5}$       7
1179649     $2^{17} \times 3^{2}$       19
1492993     $2^{11} \times 3^{6}$       7
1769473     $2^{16} \times 3^{3}$       5
1990657     $2^{13} \times 3^{5}$
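A list of this kind can be regenerated without a computer algebra system. The following Python sketch (the names `is_prime` and `smooth_primes` are assumptions, and trial division stands in for whatever Maple used) enumerates the primes $p$ in a range for which $p - 1 = 2^a 3^b$, so that the entries of the table can be compared against its output.

```python
def is_prime(n):
    """Trial-division primality test, adequate for n below a few million."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def smooth_primes(lo, hi):
    """Sorted list of (p, a, b) with lo < p < hi, p prime and p - 1 = 2^a * 3^b."""
    found = []
    a = 0
    while 2**a < hi:
        b = 0
        while 2**a * 3**b < hi:
            p = 2**a * 3**b + 1
            if lo < p < hi and is_prime(p):
                found.append((p, a, b))
            b += 1
        a += 1
    return sorted(found)


for p, a, b in smooth_primes(2**16, 2**21):
    print(p, f"2^{a} x 3^{b}")
```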
Perspective:

Simple factorizations like those above do not seem to stop occurring when
the primes grow even bigger. Here are two examples:

For the prime $p = 113246209$ the factorization of $p - 1$ is $2^{22} \times 3^3$.

For the prime $p = 725594113$ the factorization of $p - 1$ is $2^{12} \times 3^{11}$.